(ACM SRC Poster) PEAK: Parallel EM Algorithm using Kd-tree
Abstract
The data mining community voted the Expectation Maximization (EM) algorithm one of the ten algorithms with the most impact on data mining research [5]. EM is a popular iterative algorithm for learning mixture models, with applications ranging from computer vision and astronomy to signal processing. We present PEAK, a new high-performance parallel algorithm for multicore systems that addresses all stages of EM. We use tree data structures and user-controlled approximations to reduce the asymptotic runtime complexity of EM, yielding significant performance improvements. PEAK uses the same tree and algorithmic framework for all stages of EM. Experimental results show that our parallel algorithm significantly outperforms state-of-the-art algorithms and libraries on all dataset configurations (varying the number of points, the dimensionality of the dataset, and the number of mixtures). Looking forward, we identify approaches to extend this idea to a wider range of similar problems.
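For orientation, the baseline that such work aims to speed up can be sketched as follows. This is a minimal, illustrative version of plain EM for a one-dimensional Gaussian mixture, not the PEAK algorithm: it uses no kd-tree, no approximation, and no parallelism, and the function names (`gaussian_pdf`, `em_step`, `fit`) are hypothetical. It shows why the E-step is costly: every point is visited for every mixture component on every iteration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(points, weights, means, variances):
    """One plain EM iteration for a 1-D Gaussian mixture (illustrative baseline)."""
    k = len(weights)

    # E-step: responsibility of each component for each point.
    # This touches every (point, component) pair.
    resp = []
    for x in points:
        joint = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])

    # M-step: re-estimate weights, means, and variances from the soft assignments.
    new_weights, new_means, new_variances = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, points)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, points)) / nj
        new_weights.append(nj / len(points))
        new_means.append(mean)
        new_variances.append(max(var, 1e-6))  # floor keeps a component from collapsing
    return new_weights, new_means, new_variances


def fit(points, weights, means, variances, iterations=50):
    """Run a fixed number of EM iterations from the given starting parameters."""
    for _ in range(iterations):
        weights, means, variances = em_step(points, weights, means, variances)
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0, 5.1, 4.9, 5.2, 4.8]
    w, m, v = fit(data, [0.5, 0.5], [1.0, 4.0], [1.0, 1.0])
    print("weights:", w)
    print("means:", m)
    print("variances:", v)
```

A tree-based approach such as the one the abstract describes would presumably replace the per-point loop in the E-step with work over groups of nearby points, but the details of how PEAK does this are not given here.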
Similar resources
On Speeding up the Em Algorithm in Pattern Recognition: a Comparison of Incremental and Multiresolution Kd-tree-based Approaches
Finite mixture models implemented via the EM algorithm are being increasingly used in a wide range of problems in the context of unsupervised statistical pattern recognition. As each E-step visits each feature vector on a given iteration, the EM algorithm requires considerable computation time in its application to large data sets. We consider two approaches, an incremental EM (IEM) algorithm a...
Estimating Suspended Sediment by Artificial Neural Network (ANN), Decision Trees (DT) and Sediment Rating Curve (SRC) Models (Case study: Lorestan Province, Iran)
The aim of this study was to estimate suspended sediment using the ANN model, DT with the CART algorithm, and different types of SRC at ten stations in the Lorestan Province of Iran. The results showed that the accuracy of the ANN with the Levenberg-Marquardt back propagation algorithm is higher than that of the two other models, especially at high discharges. Comparison of different intervals in models showed that r...
On some Variants of the EM Algorithm for the Fitting of Finite Mixture Models
Finite mixture models are being increasingly used in statistical inference and to provide a model-based approach to cluster analysis. Mixture models can be fitted to independent data in a straightforward manner via the expectation-maximization (EM) algorithm. In this paper, we look at ways of speeding up the fitting of normal mixture models by using variants of the EM, including the so-called s...
Blocking in Parallel Multisearch
External memory (EM) algorithms are designed for computational problems in which the size of the internal memory of the computer is only a small fraction of the problem size. Block-wise access to data is a central theme in the design of efficient EM algorithms. A similar requirement arises in the transmission of data between processors in certain parallel computation models such as BSP* and CGM, ...
Highly Parallel Fast KD-tree Construction for Interactive Ray Tracing of Dynamic Scenes
We present a highly parallel, linearly scalable technique of kd-tree construction for ray tracing of dynamic geometry. We use a conventional kd-tree compatible with high-performing algorithms such as MLRTA or frustum tracing. The proposed technique offers exceptional construction speed while maintaining reasonable kd-tree quality for the rendering stage. The algorithm builds a kd-tree from scratch each fra...
Publication date: 2015